video compression
General response (R1, R2, R3)
Dear Reviewers, we thank you for taking the time to provide valuable feedback. Below we address the main issues raised. The compression performance depends on our ability to predict the distribution over future frames with low entropy, and we use RNNs to model the dynamics in the latent space. We will emphasize these aspects more in a revised version.
Appendix: Scalable Neural Video Representations with Learnable Positional Features
We train the network by adopting mean-squared error as our loss function and using the AdamW optimizer [27] with a learning rate of 0.01. Specifically, we first apply a 2-layer MLP on the output of the positional encoding layer, and then we stack 5 NeRV blocks with upscale factors 5, 3, 2, 2, 2, respectively. To be specific, on the UVG-HD benchmark, we set the number of levels as 15, the number of features per level as 2, the maximum entries per level as 2^24, and the coarsest resolution as 16.
Table 7: Decoding time of coordinate-based representations measured with FPS (higher is better).
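As a sanity check on the architecture above, the per-block upscale factors multiply to 5 * 3 * 2 * 2 * 2 = 120, so a 9x16 base grid is carried exactly to the 1080x1920 frames of the UVG-HD benchmark. A minimal sketch of this resolution trace (the function name and the 9x16 base grid are our illustrative assumptions, not from the paper):

```python
# Per-block upscale factors of the 5 stacked NeRV blocks, as stated in the text.
UPSCALE_FACTORS = [5, 3, 2, 2, 2]

def trace_resolutions(base_hw, factors):
    """Return the (H, W) spatial resolution after each upsampling block."""
    h, w = base_hw
    resolutions = [(h, w)]
    for s in factors:
        h, w = h * s, w * s
        resolutions.append((h, w))
    return resolutions

# Assumed 9x16 base grid: the product of the factors is 120,
# and 9*120 x 16*120 = 1080x1920, the UVG-HD frame size.
res = trace_resolutions((9, 16), UPSCALE_FACTORS)
print(res[-1])  # (1080, 1920)
```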